Abstract. A wide range of networks, including small-world topologies, can be modelled by the connectivity $\gamma$ and the randomness $\omega$ of the links. Both the learning and the attractor abilities of a neural network can be measured by the mutual information (MI), as a function of the load rate and of the overlap between patterns and retrieval states. We use the MI to search for the optimal topology with respect to the storage and attractor properties of the network. We find that, while the largest storage implies an optimal $MI(\gamma, \omega)$ at $\gamma_{opt}(\omega) \to 0$, the largest basin of attraction leads to an optimal topology at moderate levels of $\gamma_{opt}$, whenever $0 < \omega < 0.3$. This $\gamma_{opt}$ is related to the clustering and path-length of the network. We also build a diagram for the dynamical phases with random and local initial overlap, and show that very diluted networks lose their attractor ability.
1 Introduction

Interest in attractor neural networks (ANN), which originally dealt with fully-connected architectures, has been renewed by the study of more realistic topologies. Among them, the small-world (SW) graph [3], modelled by only two parameters, can capture most features of a wide range of networks. The parameters are $\gamma = K/N$, the average number of links per neuron, $K$, relative to the network size, $N$, and $\omega$, which controls the rate of random links (among all $K$ neighbors). The load rate $\alpha = P/K$ (where $P$ is the number of independent patterns) and the overlap $m$ between neuron states and memorized patterns are the most used measures of the retrieval ability of the networks [6].

The overlap as a function of $\alpha$ is plotted in the upper panels of Fig. 1 for fully-connected (FC, left panel), moderately-diluted (MD, central) and extremely-diluted (ED, right) networks. The FC network has a critical load $\alpha_c^{FC} \approx 0.138$, with overlap $m_c^{FC} \approx 0.97$, and a sharp transition to $m \to 0$ for larger $\alpha > \alpha_c$, where it fails to retrieve, as seen in the left panel. However, for ED networks ($K \ll N$), the transition is smooth. In particular, the random ED network (RED, $\omega = 1.0$, circles and dashed line) has $\alpha_c^{RED} \approx 0.64$ [7], but the overlap falls continuously to $m_c^{RED} \approx 0$.

Less attention has been paid to the study of the mutual information (MI) between stored patterns and the neural states. The lower panels of Fig. 1 display the information rate, $i$, evaluated from the conditional probability of the neuron states $\sigma$ given the patterns $\xi$, for the mean-field (MF) networks we deal with. The FC case shows a critical information of about $i_c^{FC} \approx 0.132$. The RED network has null information at $\alpha_c$, $i_c^{RED} \approx 0.0$. However, if one looks for the value $\alpha_{max}$ corresponding to the maximal information $i_{max} \equiv i(\alpha_{max})$, instead of $\alpha_c$, one finds $i_{max}^{RED} \approx 0.223$ for $\alpha_{max}^{RED} \approx 0.32$. For the FC network the two values coincide, $i_{max}^{FC} = i_c^{FC}$.

We address the problem of finding the topology which maximizes the MI. Using the graph framework, we built networks with two parameters: the connectivity rate, $\gamma$, running from the FC ($\gamma = 1$) to the ED ($\gamma \to 0$) networks; and the randomness, $\omega$, ranging from local ($\omega = 0$) to random ($\omega = 1$) neighbors. Diluted topologies with $\omega \approx 0.1$, with large clustering coefficient ($C$) and small mean-path-length ($L$) between neurons, so-called small-world (SW) networks, are rather useful when one needs fast and noise-robust information transmission without spending too much wiring [2]. SW networks may model many biological systems; for instance, in a brain local connections dominate within the cortex, while there are a few intercortical connections.

The right panels of Fig. 1 also plot $m$ and $i(\alpha)$ for a SW ED network (SED, $\omega = 0.2$), with maximal information $i_{max} \approx 0.165$, and for the local ED network (LED, $\omega = 0.0$), with $i_{max} \approx 0.0855$; they show how the information increases with the randomness $\omega$. The central panels of Fig. 1 plot MD networks. Comparing different dilution levels, one sees that $i$ increases (decreases) with $\gamma$ for local (random) networks, and remains about the same for SW topologies. A question arises about the optimal topology: if the randomness $\omega$ is fixed (by physical constraints), which is the best connectivity $\gamma$? To our knowledge, no previous answer to this question is known. We approach this problem from two scenarios: the stability and the retrieval attractor. We will show that, concerning the stability of a pattern, the RED network performs the best, $\gamma_{opt} \to 0$. However, regarding the attractor basins, the optimal topology holds for MD networks; for instance, $\omega \approx 0.1$ leads to an optimal $\gamma_{opt} \approx 10^{-2}$.

Fig. 1. The overlap $m$ and the information $i$ vs $\alpha$ for different architectures: fully-connected, $\gamma_{FC} = 1.0$ (left), moderately-diluted, $\gamma_{MD} = 10^{-2}$ (center) and extremely-diluted, $\gamma_{ED} = 10^{-4}$ (right). Symbols represent simulations with initial overlap $m^0 = 1$ and $|J| = 40$M, with local (stars, $\omega = 0.0$), small-world (filled squares, $\omega = 0.2$), and random (circles, $\omega = 1.0$) connections. Lines are theoretical results: solid, $\omega = 0.0$; dotted, $\omega = 0.2$; and dashed, $\omega = 1.0$. In the left panel, the dashed line is an average of the simulation.

The structure of the paper is the following: in Sec. 2, we define the topology and neural-dynamics model, and review the information measures used in the calculations. The results are shown in Sec. 3, where we study retrieval by theory and simulation. We present a diagram for the phases with local and random initial conditions, and show a relation between topology and MI. Conclusions are drawn in the last section.



2 The Model 

2.1 Topology and Dynamics 

The synaptic couplings are $J_{ij} \equiv C_{ij} W_{ij}$, where $C = \{C_{ij}\}$ is the topology matrix and $W = \{W_{ij}\}$ are the learning weights. The topology splits into local and random links, $C_{ij} = C^l_{ij} + C^r_{ij}$. The local part connects the $K_l$ nearest neighbors, $C^l_{ij} = \sum_{k \in V} \delta(i - j - k)$, with $V = \{1, ..., K_l\}$ in the asymmetric case, on a closed ring. The random part consists of independent random variables $\{C^r_{ij}\}$, distributed with probability $p(C^r_{ij} = 1) = c_r$, and $C^r_{ij} = 0$ otherwise, with $c_r = K_r/N$, where $K_r$ is the mean number of random connections of a single neuron. Hence, the neuron connectivity is $K = K_l + K_r$. The network topology is then characterized by two parameters: the connectivity ratio, defined as $\gamma = K/N$, and the randomness ratio, $\omega = K_r/K$. The symmetry constraints seem to have only side effects on the information properties. The parameter $\omega$ plays the role of the rewiring probability in the small-world model (SW) [3]. Our model was proposed by Newman and Watts [11], and has the advantage of avoiding disconnecting the graph.
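The construction above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `build_topology` is hypothetical, and, as a simplifying assumption, each neuron receives exactly $K_r = \omega K$ random links instead of a random number of links drawn with probability $c_r = K_r/N$.

```python
import random


def build_topology(n, k, omega, seed=0):
    """Adjacency list: neighbors[i] holds the K neurons j with C_ij = 1.

    Assumption (not from the paper): every neuron gets exactly
    K_r = round(omega * K) random links, rather than a random number
    of links with mean K_r.
    """
    rng = random.Random(seed)
    k_r = round(omega * k)  # random links per neuron
    k_l = k - k_r           # local links per neuron
    neighbors = []
    for i in range(n):
        # Local part: the K_l nearest neighbors on one side of a closed
        # ring (asymmetric case), j = i - 1, ..., i - K_l modulo N.
        local = [(i - d) % n for d in range(1, k_l + 1)]
        # Random part: K_r distinct neurons among the sites not yet linked.
        excluded = set(local) | {i}
        candidates = [j for j in range(n) if j not in excluded]
        neighbors.append(local + rng.sample(candidates, k_r))
    return neighbors


# Small example: N = 50 neurons, K = 10 links each, omega = 0.2,
# so gamma = K / N = 0.2 with 8 local and 2 random links per neuron.
topology = build_topology(50, 10, 0.2)
print(topology[0])
```

Keeping the local links and adding random ones, rather than rewiring, mirrors the Newman-Watts idea mentioned in the text; the candidate scan is linear in $N$ per neuron, which is fine for a demonstration but too slow for the network sizes used in the simulations.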

Note that the topology $C$ can be defined by an adjacency list connecting neighbors, $i_k$, $k = 1, ..., K$, with $C_{ij} = 1 : j = i_k$. So the storage cost of this network is $|J| = N \cdot K$. The learning algorithm updates $W$ according to the Hebb rule

$$W^{\mu}_{ij} = W^{\mu-1}_{ij} + \xi^{\mu}_i \xi^{\mu}_j. \qquad (1)$$

The network starts at $W^0_{ij} = 0$, and after $\mu = P = \alpha K$ learning steps it reaches the value $W_{ij} = \sum_{\mu=1}^{P} \xi^{\mu}_i \xi^{\mu}_j$. The learning stage is a slow dynamics, being stationary on the time scale of the much faster retrieval stage, which we define in the following.
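The Hebb rule of Eq. (1) only needs to be evaluated on the links that exist in the adjacency list. The following is a minimal sketch under that reading; the name `hebb_weights` and the nested-list layout of the weights are illustrative assumptions.

```python
def hebb_weights(patterns, neighbors):
    """Accumulate W_ij = sum_mu xi_i^mu * xi_j^mu on the existing links only.

    weights[i][s] is the weight of the link from neighbors[i][s] to neuron i.
    """
    weights = [[0] * len(nb) for nb in neighbors]  # W^0_ij = 0
    for xi in patterns:                            # one learning step per pattern
        for i, nb in enumerate(neighbors):
            for s, j in enumerate(nb):
                weights[i][s] += xi[i] * xi[j]     # Hebb rule, Eq. (1)
    return weights


# Toy example: ring of N = 6 neurons, each listening to its 2 predecessors.
neighbors = [[(i - 1) % 6, (i - 2) % 6] for i in range(6)]
patterns = [
    [1, 1, -1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1],
]
weights = hebb_weights(patterns, neighbors)
print(weights)
```

Storing one weight per listed neighbor keeps the memory cost at $N \cdot K$ numbers, in line with the storage cost $|J|$ quoted above.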
The neural states, $\sigma_i^t \in \{\pm 1\}$, are updated according to the dynamics:

$$\sigma_i^{t+1} = \mathrm{sign}(h_i^t), \quad h_i^t \equiv \sum_j J_{ij} \sigma_j^t, \quad i = 1...N. \qquad (2)$$

In the case of symmetric synaptic couplings, $J_{ij} = J_{ji}$, an energy function $H_s = -\sum_{(i,j)} J_{ij} \sigma_i \sigma_j$ can be defined, whose minima are the stable states of the dynamics Eq. (2).
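One parallel step of the retrieval dynamics, Eq. (2), can be sketched as follows. The function name `update` and the convention $\mathrm{sign}(0) = +1$ are assumptions made for the example; the toy network stores a single pattern so that the expected behavior is easy to check by hand.

```python
def update(state, neighbors, weights):
    """One parallel step of Eq. (2): all neurons are updated simultaneously."""
    new_state = []
    for nb, w in zip(neighbors, weights):
        # Local field h_i: sum over the listed neighbors of J_ij * sigma_j.
        h = sum(w_ij * state[j] for j, w_ij in zip(nb, w))
        new_state.append(1 if h >= 0 else -1)  # sign(0) := +1 (assumption)
    return new_state


# Toy example: a single stored pattern on a ring where each neuron
# listens to its 4 predecessors.
n = 20
pattern = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1,
           1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
neighbors = [[(i - d) % n for d in range(1, 5)] for i in range(n)]
weights = [[pattern[i] * pattern[j] for j in nb] for i, nb in enumerate(neighbors)]

noisy = list(pattern)
noisy[7] = -noisy[7]  # flip one neuron
recovered = update(noisy, neighbors, weights)
print(recovered == pattern)
```

With one stored pattern and a single flipped neuron, every local field keeps the sign of the pattern, so one step is enough to restore it; with many patterns the cross-talk between them is what limits retrieval.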

In the present paper, we work out the asymmetric network by simulation (no constraint $J_{ij} = J_{ji}$). The theory was carried out for symmetric networks. Biological networks are usually asymmetric, but this feature does not allow any thermodynamic approach. As seen in Fig. 1, theory and simulation show similar results, except for local networks (where the theory underestimates $\alpha_{max}$), for which the symmetry may play some role. A stochastic macro-dynamics takes place due to the extensive learning of $P = \alpha K$ patterns.



2.2 The Information Measures 

The network state at a given time $t$ is defined by a set of binary neurons, $\sigma^t = \{\sigma_i^t \in \{\pm 1\}, i = 1, ..., N\}$. Accordingly, each pattern $\xi^{\mu} = \{\xi_i^{\mu} \in \{\pm 1\}, i = 1, ..., N\}$ is a set of site-independent random variables, binary and uniformly distributed: $p(\xi_i^{\mu} = \pm 1) = 1/2$. The network learns a set of independent patterns $\{\xi^{\mu}, \mu = 1, ..., P\}$.

The task of the neural channel is to retrieve a pattern (say, $\xi \equiv \xi^{\mu}$) starting from a neuron state $\sigma^0$ which is inside its attractor basin. This is achieved through a network dynamics coupling neurons through a synaptic matrix $J \equiv \{J_{ij}\}$ with cardinality $|J| = N \times K$. The relevant order parameter is the overlap between the neural states and the pattern:

$$m_N^t \equiv \frac{1}{N} \sum_i \xi_i \sigma_i^t, \qquad (3)$$

at the time step $t$. Together with the overlap, one needs a measure of the load, which is the rate of pattern bits per synapse used to store them. Since the synapses and patterns are independent, the load is given by $\alpha = |\{\xi^{\mu}\}|/|J| = PN/(NK) = P/K$.
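The two quantities defined above, the overlap of Eq. (3) and the load $\alpha = P/K$, translate directly into code. The helper names `overlap` and `load` below are hypothetical and serve only as an illustration.

```python
def overlap(state, pattern):
    """m = (1/N) * sum_i xi_i * sigma_i, Eq. (3)."""
    return sum(s * x for s, x in zip(state, pattern)) / len(pattern)


def load(num_patterns, k):
    """alpha = P / K: pattern bits (P * N) per synapse (N * K)."""
    return num_patterns / k


pattern = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]
state = list(pattern)
state[0] = -state[0]  # one wrong neuron out of ten
print(overlap(state, pattern), load(60, 600))
```

Each wrong neuron lowers the overlap by $2/N$, so a state with a fraction $f$ of errors has $m = 1 - 2f$.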

We require the interactions $J$ to be long-range, and neglect spatial correlations. Hence, we regard a mean-field network (MFN), where the distribution of the states is assumed to be site-independent. Therefore, according to the law of large numbers, the overlap can be written, for $K, N \to \infty$, as $m^t = \langle \sigma^t \xi \rangle_{\sigma,\xi}$. The brackets represent an average over the joint distribution $p(\sigma, \xi)$ for a single neuron, understood as an ensemble distribution for the neuron states $\{\sigma_i\}$ and pattern $\{\xi_i\}$.

This distribution factorizes into the conditional probability $p(\sigma|\xi) = (1 + m\sigma\xi)\,\delta(\sigma^2 - 1)$ [13] and the input probability $p(\xi)$. In $p(\sigma|\xi)$, all types of noise in the retrieval process are enclosed (both from the environment and from the dynamical process itself). With the above expressions and $p(\sigma) = \delta(\sigma^2 - 1)$, we can calculate the MI, a quantity used to measure the prediction that an observer at the output ($\sigma$) can make about the input $\xi$ (we drop the time index $t$). It reads $MI[\sigma; \xi] = S[\sigma] - S[\sigma|\xi]$, where the output and conditional entropies are given (in bits) by [13]:

$$S[\sigma|\xi] = -\frac{1+m}{2}\log_2\frac{1+m}{2} - \frac{1-m}{2}\log_2\frac{1-m}{2}, \qquad S[\sigma] = 1\ [\mathrm{bit}]. \qquad (4)$$

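We define the information rate as

$$i(\alpha, m) = MI[\sigma|\{\xi^{\mu}\}]/|J| \equiv \alpha\, MI[\sigma; \xi], \qquad (5)$$

since for independent neurons and patterns, $MI[\sigma|\{\xi^{\mu}\}] = \sum_{i\mu} MI[\sigma_i|\xi_i^{\mu}] = NP\, MI[\sigma; \xi]$ and $|J| = NK$. The information is $i = \alpha\, MI$, Eq. (5), where the load rate is scaled as $\alpha = P/K$.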
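Equations (4) and (5) can be evaluated numerically with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the convention $0 \log_2 0 = 0$ is applied at $m = \pm 1$.

```python
import math


def conditional_entropy(m):
    """S[sigma|xi] of Eq. (4), in bits, with 0 * log2(0) taken as 0."""
    entropy = 0.0
    for p in ((1 + m) / 2, (1 - m) / 2):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def mutual_information(m):
    """MI[sigma; xi] = S[sigma] - S[sigma|xi], with S[sigma] = 1 bit."""
    return 1.0 - conditional_entropy(m)


def information_rate(alpha, m):
    """i(alpha, m) = alpha * MI, Eq. (5)."""
    return alpha * mutual_information(m)


for m in (0.0, 0.5, 0.9, 1.0):
    print(m, mutual_information(m), information_rate(0.3, m))
```

The MI vanishes at $m = 0$ and reaches one bit at $m = 1$, so at perfect retrieval the information rate simply equals the load $\alpha$.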

When the network approaches its saturation limit $\alpha_c$, the neuron states cannot remain close to the patterns, so $m_c$ is usually small. So, while the number of patterns increases, the information per pattern decreases. Therefore, the information $i(\alpha, m)$ is a non-monotonic function of the overlap and load rate (see Fig. 1), which reaches its maximum value $i_{max} = i(\alpha_{max})$ at some value $\alpha_{max} < \alpha_c$ of the load.



3 Results 

We studied the information for the stationary and dynamical states of the network, as a function of the topological parameters, $\omega$ and $\gamma$. A sample of the results for simulation and theory is shown in Fig. 1, where the stationary states of the overlap and information are plotted for the FC, MD and ED architectures. It can be seen that the information increases with dilution and with the randomness of the network. A reason for this behavior is that dilution decreases the correlation due to the interference between patterns. However, dilution also increases the mean-path-length of the network; thus, if the connections are local, the information flows slowly over the network. Hence, the neuron states can eventually be trapped in noisy patterns. So, $i_{max}$ is small for $\omega \approx 0$ even if $\gamma = 10^{-4}$.



3.1 Theory: Storage 

The theoretical approach follows the Gardner calculations [7]. We assume that the network state is near a given pattern. At temperature $T = 0$, the MF approximation gives the fixed-point equations:

$$m = \mathrm{erf}(m/\sqrt{r\alpha}), \qquad (6)$$

$$\chi = 2\varphi(m/\sqrt{r\alpha})/\sqrt{r\alpha}, \qquad (7)$$

$$r = \sum_{k=0}^{\infty} a_k (k+1) \chi^k, \quad a_k = \gamma\, \mathrm{Tr}[(C/K)^{k+2}], \qquad (8)$$

with $\mathrm{erf}(x) \equiv 2\int_0^x \varphi(z)\,dz$ and $\varphi(z) \equiv e^{-z^2/2}/\sqrt{2\pi}$. The parameter $a_k$ is the probability of existence of a cycle of length $k + 2$ in the graph $C$. The $a_k$ can be calculated either by Monte Carlo [14] or by an analytical approach, which gives $a_k \sim \sum_m \int d\theta\, [p(\theta)]^k e^{im\theta}$, where $p(\theta)$ is the Fourier transform of the probability of links, $p(C_{ij})$. For the RED and FC networks one recovers the known results, $r_{RED} = 1$ and $r_{FC} = 1/(1 - \chi)^2$, respectively.

[Figure 2: Theory vs. simulation (stability of $m > 0$ vs. $m^0 = 0.1$, $t = 50$).]

Fig. 2. Maximal information $i_{max} \equiv i(\alpha_{max})$ vs $\gamma$. Theory for the stationary states (solid lines), and simulations with $m^0 = 0.1$ (dashed lines), with several values of randomness $\omega$.
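For the two limiting topologies where $r$ has a closed form, the fixed-point equations (6)-(8) can be iterated numerically. The sketch below is only an illustration under stated assumptions: the plain iteration scheme, the function names and the grid of loads are choices made for this example, and the iteration is not guaranteed to behave well near the FC transition. Note that the erf defined in the text equals `math.erf(x / sqrt(2))` in Python.

```python
import math


def phi(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def erf_paper(x):
    """erf(x) = 2 * integral_0^x phi(z) dz, i.e. math.erf(x / sqrt(2))."""
    return math.erf(x / math.sqrt(2))


def solve_overlap(alpha, r_of_chi, m0=1.0, steps=2000):
    """Iterate Eqs. (6)-(8) from m0; r_of_chi maps chi to r for the topology."""
    m, r = m0, 1.0
    for _ in range(steps):
        s = math.sqrt(r * alpha)
        chi = 2 * phi(m / s) / s                 # Eq. (7)
        r = r_of_chi(chi)                        # Eq. (8) in closed form
        m = erf_paper(m / math.sqrt(r * alpha))  # Eq. (6)
    return m


def r_red(chi):
    return 1.0                     # random extremely diluted: r = 1


def r_fc(chi):
    return 1.0 / (1.0 - chi) ** 2  # fully connected: r = 1 / (1 - chi)^2


def information_rate(alpha, m):
    """i = alpha * (1 - S[sigma|xi]), Eqs. (4) and (5)."""
    entropy = 0.0
    for p in ((1 + m) / 2, (1 - m) / 2):
        if p > 0:
            entropy -= p * math.log2(p)
    return alpha * (1.0 - entropy)


# Scan the load for the RED theory and locate the maximal information.
grid = [a / 100 for a in range(5, 61)]
i_max, alpha_max = max(
    (information_rate(a, solve_overlap(a, r_red)), a) for a in grid
)
print(alpha_max, i_max)
```

Starting from $m^0 = 1$ selects the retrieval solution whenever it exists; above the critical load the iteration decays to the zero solution $m = 0$.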



The theoretical dependence of the information on the load, for FC, MD and ED networks with local, small-world and random connections, is plotted as lines in Fig. 1. A comparison between theory and simulation is also given in Fig. 1. It can be seen that both results agree for most $\omega > 0$, but the theory fails for $\omega = 0$. One reason is that the theory uses the symmetric constraint, while the simulation was carried out with asymmetric synapses. The solid lines in Fig. 2 show the maxima $i_{max}(\gamma, \omega) \equiv i(\alpha_{max}; \gamma, \omega)$ as a function of the topological parameters $\omega$ and $\gamma$. It is seen that the thermodynamical optimum $i(\gamma_{opt})$ is at $\omega_{opt} \to 1$, $\gamma_{opt} \to 0$. This implies that the best topology for the information, with respect to the stationary states, is the RED network. It is worth noting that the simulation converges to the theoretical results if $m^0 = 1.0$ when $t \to \infty$; this means that the theory accounts for the storage capacity of the network. However, a qualitatively different behavior holds for the simulation with low $m^0 = 0.1$ (see Fig. 2), which displays optima $i(\gamma_{opt})$ for MD topologies.



3.2 Simulation: Attractors 

The theoretical equations for the stationary states, Eqs. (6)-(8), account only for the existence of the retrieval (R) solution $m > 0$. However, they say nothing about its stability. The zero states (Z), $m = 0$, are also a solution of Eqs. (6)-(8), so both R and Z may coexist in some region of the topological parameters $\gamma$, $\omega$. In order to study the stability of the attractors, we simulated Eq. (2) and checked how the network behaves for different initial conditions.

Both local and random connections are asymmetric. The simulation was carried out with $N \times K = 36 \cdot 10^6$ synapses, storing an adjacency list as data structure instead of $J_{ij}$. For instance, with $\gamma = K/N = 0.01$, we used $K = 600$, $N = 6 \cdot 10^4$. In [15] the authors use $K = 50$, $N = 5 \cdot 10^3$, which is far from the asymptotic limit.

To check for the attractor properties of the retrieval, the neuron states start far from a learned pattern, but inside its basin of attraction, $\sigma^0 \in B(\xi)$. First, we choose an initial configuration given by a random correlation with the pattern, $p(\sigma_i^0 = \pm\xi_i | \xi_i) = (1 \pm m^0)/2$, for all neurons (so we avoid a bias between local/random neighbors). We call this the $m_R$ initial overlap. The retrieval dynamics starts with an overlap $m^0 = 0.2$ and stops after $t_f = 50$ steps (unless it converges to a fixed point $m^*$ before $t = t_f$). Usually, $t_f = 20$ parallel (all neurons) updates is a large enough delay for retrieval. The information $i(\alpha, m; \gamma, \omega)$ is calculated. We averaged over a window in the axis of $P$, usually $\delta P = 25$.
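The random ($m_R$) initial condition described above can be generated as in the following sketch. The function name `random_initial_state` and the network size are assumptions chosen for the example.

```python
import random


def random_initial_state(pattern, m0, rng):
    """Each neuron copies the pattern with probability (1 + m0) / 2."""
    p_same = (1 + m0) / 2
    return [x if rng.random() < p_same else -x for x in pattern]


rng = random.Random(1)
pattern = [rng.choice((-1, 1)) for _ in range(20000)]
state = random_initial_state(pattern, 0.2, rng)
m_start = sum(s * x for s, x in zip(state, pattern)) / len(pattern)
print(m_start)
```

Because every neuron is drawn independently, the measured overlap fluctuates around $m^0$ with a spread of order $1/\sqrt{N}$, and the errors are spread uniformly over the network.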

The results are depicted in the dashed curves of Fig. 2, where the $i_{max}(\gamma, \omega)$ are plotted against $\gamma$. One sees that, unlike the theoretical results for the storage capacity, there are MD topologies which perform best when the attractor properties are considered. Starting with $m^0 = 0.1$ yields optima $i(\gamma_{opt})$ for moderate dilutions; for instance, with $\omega = 0.2$, it holds $\gamma_{opt} \approx 10^{-2}$.

Next, we compare $m_R$ with another type of initial distribution. The neurons start with local correlations: $\sigma_i^0 = \xi_i$ for $i = 1, ..., (N m^0)$, and random otherwise. We call it the $m_L$ initial overlap. The results are shown in Fig. 3. In the lower (upper) panels we see the behavior with the $m_R$ ($m_L$) initial overlap. The first observation now is that the maximal information $i_{max}(\gamma, \omega)$ increases with dilution (smaller $\gamma$) if the network is more random, $\omega \approx 1$, while it decreases with dilution if the network is more local, $\omega \approx 0$. However, there is a moderate $\gamma_{opt}$ for which the information $i(\gamma_{opt})$ is optimized. For instance, with $\omega = 0.1$, starting with initial overlap $m_R$, the optimum is $i(\gamma_{opt}) \approx 0.148$ at $\gamma_{opt} = 10^{-2}$. For $m_L$, the optimum is $i(\gamma_{opt}) \approx 0.138$ at $\gamma_{opt} = 10^{-2}$. We see that the initial $m_R$ allows for an easier retrieval for any $\omega$, but local topologies ($\omega = 0$) are very sensitive to the type of initial overlap, and lose their retrieval abilities if the connectivity is $\gamma < 10^{-2}$. We also observe in Fig. 3 a feature of the $m_L$ condition: the network improves its retrieval ability with learning ($m$ increases with $\alpha$) before the information reaches its maximum, which resembles a stochastic resonance effect.
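For comparison with the random condition, the local ($m_L$) initial state can be sketched as below: a contiguous block of $N m^0$ neurons is aligned with the pattern and the rest are drawn at random. The function name `local_initial_state` and the sizes are illustrative assumptions.

```python
import random


def local_initial_state(pattern, m0, rng):
    """First N * m0 neurons equal the pattern; the others are random."""
    n_aligned = round(m0 * len(pattern))
    aligned = list(pattern[:n_aligned])
    noise = [rng.choice((-1, 1)) for _ in range(len(pattern) - n_aligned)]
    return aligned + noise


rng = random.Random(1)
pattern = [rng.choice((-1, 1)) for _ in range(20000)]
state = local_initial_state(pattern, 0.2, rng)
m_start = sum(s * x for s, x in zip(state, pattern)) / len(pattern)
print(m_start)
```

The global overlap is again close to $m^0$, but here all the information sits in one region of the ring, so neurons with only local links far from that region initially receive pure noise.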
Fig. 4. Diagram ($\omega \times \gamma$) with the phases R and L, for initial overlap $m^0 = 0.2$.



The comparison between the upper ($m_L$) and lower ($m_R$) panels of Fig. 3 shows that the non-monotonic behavior of the information with dilution is stronger for the local than for the random initial overlap. This sensitivity to the initial conditions can be understood in terms of the basins of attraction. Random topologies have very deep attractors, especially if the network is diluted enough, while regular topologies almost lose their retrieval abilities with dilution. However, since the basins become rougher with dilution, the network takes longer to reach the attractor, and can be trapped in metastable states. Hence, the competition between depth and roughness is won by the more robust MD networks.

The retrieval capability of the network when starting from condition $m_R$ or $m_L$ is plotted in Fig. 4. We denote by R the phase where the retrieval reaches at least the information $i_{max} = 0.05$, starting from $m^0 = 0.2$ with $m_R$. The phase L is the same, but starting with $m_L$. Good retrieval ($i_{max} > 0.05$) is not achieved with the $m_L$ condition either for very connected or for local diluted topologies.



3.3 Clustering and Mean-Length-Path 

We describe here the topological features of the network as a function of its parameters: the clustering coefficient, $c$, and the mean-path-length between neurons, $l$. When $\gamma$ is large, the net has large $c$, $c \approx 1$, and small $l$, $l = O(1)$, whatever $\omega$ is used. When $\gamma$ is small, then if $\omega \approx 0$, the net is clustered ($c = O(1)$) and has long paths ($l \sim N/K$); if $\omega \approx 1$, the net becomes random ($c \ll 1$ and $l \sim \ln N$). However, if the randomness is about $\omega \approx 0.1$, then $c = O(1)$ but with $l \sim \ln N$, and the network behaves as a small-world (SW): clustered but with short paths.

Fig. 5. The maximal information $i(\alpha_{max})$ (bottom), clustering coefficient $c$ and mean-path-length $l$ (top) vs $\omega$, for $\gamma = 10^{-2}, 10^{-3}$ (from left to right). Simulations with $N \cdot K = 10$M.
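The two graph measures discussed above can be computed on a small example with breadth-first search. This is a sketch under simplifying assumptions that differ from the simulated model: the graph here is undirected, shortcuts are added in the Newman-Watts manner, and all function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Undirected ring: each node linked to its k/2 nearest neighbors per side."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for d in range(1, k // 2 + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj


def add_shortcuts(adj, count, rng):
    """Add random links without removing local ones (graph stays connected)."""
    n, added = len(adj), 0
    while added < count:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            added += 1


def clustering_coefficient(adj):
    """Average fraction of linked pairs among the neighbors of each node."""
    total = 0.0
    for nb in adj:
        deg = len(nb)
        if deg < 2:
            continue
        links = sum(1 for u in nb for v in nb if u < v and v in adj[u])
        total += links / (deg * (deg - 1) / 2)
    return total / len(adj)


def mean_path_length(adj):
    """Average shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(20, 4)
c_ring, l_ring = clustering_coefficient(ring), mean_path_length(ring)
small_world = ring_lattice(20, 4)
add_shortcuts(small_world, 4, random.Random(0))
c_sw, l_sw = clustering_coefficient(small_world), mean_path_length(small_world)
print(c_ring, l_ring, c_sw, l_sw)
```

Adding links can only shorten shortest paths, so a few shortcuts reduce $l$ while leaving most of the local triangles, and hence the clustering, in place; this is the small-world regime described in the text.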

The dependence of $c$, $l$ and $i$ on the randomness $\omega$ is plotted in Fig. 5. In all panels, for connectivity $\gamma = 10^{-1}$, $10^{-2}$, $10^{-3}$, we see a decrease of $c$ and $l$ and an increase of $i$ with $\omega$. For this range of $\omega$, the path-length $l$ has already decreased to small values, and the networks have entered the SW region. However, in the right panels, there is still some slow decrease of $l$. The clustering $c$ decreases fast around $\omega = 0.2$, after which the network is random-like.

This region $0.001 < \omega < 0.2$ is the SW regime for the network we study. On the other hand, looking at the bottom panels, we see that at the end of the SW regime, for $0.05 < \omega < 0.20$, there is a fast increase of $i$. We conjecture that beyond the SW region, a further increase of the randomness $\omega$ is not worth its wiring cost for the little extra information $i$ gained.



4 Conclusions 

We have discussed how the information capacity of an attractor neural network depends on its topology. We calculated the mutual information for a Hopfield model with Hebbian learning, varying the connectivity ($\gamma$) and randomness ($\omega$) parameters, and obtained the maximum with respect to $\alpha$, $i_{max}(\gamma, \omega) \equiv i(\alpha_{max}; \gamma, \omega)$. The information $i_{max}$ always increases with $\omega$, but for a fixed $\omega$, an optimal topology $\gamma_{opt}$, in the sense of the information, $i_{opt} \equiv i_{max}(\gamma_{opt}, \omega)$, can be found. We presented stationary and attractor states.

From the stability calculations, the optimal topology with respect to the storage is the random extremely diluted (RED) network. Indeed, if no pattern completion is required, the patterns can be stored statically, and the best way is without any connectivity. For retrieval dynamics, however, this is not true: we found there is an intermediate optimal $\gamma_{opt}$ for any fixed $0 < \omega < 0.3$. Moreover, local diluted (LD) networks are even more damaged if they start with local overlap than if the initial overlap is random. This can be understood regarding the shape of the attractors. The ED network waits much longer for the retrieval than more connected networks do, so the neurons can be trapped in spurious states with vanishing information.

We found a relation between the fast increase of information with $\omega$ and the small-world region of the topology. This implies that it is worth the wiring cost to stay at the end of the SW zone. Both in nature and in technological approaches to neural devices, dynamics is an essential issue for information processing. So, an optimized topology holds for any practical purpose, even if no attention is paid to wiring or other energetic costs of random links [12]. The reason for the intermediate $\gamma_{opt}$ is a competition between the broadness (larger storage capacity) and roughness (slower retrieval speed) of the attraction basins.
We believe that the maximization of information with respect to the topology could be a biological criterion (where non-equilibrium phenomena are relevant) to build real neural networks. We expect that the same dependence should hold for more structured networks and learning rules. More complex initial conditions may also play some role in the retrieval, and this is worth a dedicated study.